A single-instruction multiple-data processor (10)
has an input layer especially designed for high data 
input and output rates. The processor (10) has a 
number of processing elements (20), each corre- 
sponding to incoming data samples. The processing 
elements (20) are interleaved so that a set of sam- 
ples can be input in parallel. This configuration is 
especially useful for scan rate conversion, where
the data output rate may not necessarily be the 
same as the data input rate. The processor (10) is 
also programmable, which makes it especially useful 
for digital filtering. Near-neighbor communications 
among processing elements (20) realize the delays 
required for horizontal filtering.

TECHNICAL FIELD OF THE INVENTION

This invention generally relates to single in-
struction, multiple data processors, and more par-
ticularly to such a processor that is program-
mable and achieves a high processing throughput
speed.

BACKGROUND OF THE INVENTION 

Single-instruction multiple-data (SIMD) proces- 
sors are characterized by having an array of pro- 
cessors that perform the same operation simulta- 
neously on every element of a data array. Vector 
processing, an application of SIMD processors, 
uses vector instructions, which specify the opera- 
tion to be performed and specify the list of 
operands, i.e., the data vector, on which it will
operate. 

The use of processor arrays and vector pro- 
cessing can result in extensive parallelism, result- 
ing in high execution speeds. Yet, despite impres- 
sive execution speeds, getting data in and out of 
the processor can be a problem. Execution speeds 
are less useful if input/output speeds cannot keep 
up. 

In many applications, such as video process- 
ing, real-time processing speed is desirable. Yet, a 
stumbling block to real-time processing is the large 
amount of data that must be processed to generate 
the pixels, lines, and frames of a video picture. 

A need exists for an easily manufactured SIMD 
processor that maximizes data input rates without 
increasing manufacturing costs. Although the need 
for such processors is not limited to television, 
digital television processing involves processing 
tasks, such as scan rate conversion and various 
filtering processes, for which a processor with a 
fast throughput is desirable. 

With regard to scan rates, television relies on 
the concept that a picture can be broken down into 
a mosaic suitable for transmitting and then re- 
assembled to produce a television picture. This 
process is accomplished with linear scanning. The 
television picture is scanned in a sequential series 
of horizontal lines, at both the transmitting and 
receiving end of the television system. 

Various geopolitical regions have different 
scanning standards. The United States uses the 
National Television Systems Committee (NTSC) 
standard. Each picture, i.e., frame, has 525 lines. 
These lines are interlaced to make two fields hav- 
ing 262.5 lines each, and each field is scanned at a 
rate of 60 fields per second. Some countries use a 
Phase Alternate Line (PAL) system, which has simi- 
lar characteristics. Other countries use a Sequential 
Color and Memory (SECAM) system, in which 625 
lines make up a frame. The lines are interlaced to
make two fields having 312.5 lines each, and each
field is scanned at a rate of 50 fields per second. 

If a scan rate is too slow, the viewer will notice 
a large area flicker. The 60 Hz and 50 Hz standard
scan rates are intended to exceed a rate at which
flicker is annoyingly noticeable, but not place ex- 
pensive technological demands on the receiving 
system. Nevertheless, faster scan rates are desir- 
able for improved viewing. 

In addition to scan rates, another factor in
picture quality is the number of lines per frame,
i.e., vertical resolution. If there are too few lines, the 
distinction between each line is perceptible. Like 
scan rate standards, the selection of a standard
number of lines per field is intended to surpass the
viewer's annoyance level without unduly burden- 
some technological costs. Yet, like faster scan 
rates, higher line per field ratios are desirable for 
improved viewing. 

Recent developments in television systems in-
clude digital processing within the receiver to con-
vert scan characteristics, such as scan rates and
lines per field. Yet, existing digital receivers pro-
cess data serially, and because of processing
throughput limitations of serial systems, the pixel
resolution is limited. A need exists for a television
receiving system that receives an incoming signal
with one set of scan characteristics and generates
a picture with different scan characteristics. The
processing throughput should not unduly constrain
the level of pixel resolution.

Another television application of digital pro-
cessing devices is digital filters. For example, digi-
tal comb filtering is used to separate the luminance
and chrominance signals from each other. In gen-
eral, digital filters are expressed as z-transform
functions, in which the terms represent weighted
time delays.
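
As an illustration of the preceding statement, a finite impulse response filter with K taps can be written in the following general form. The symbols H, a_k, x, y, and K are notation assumed here for the example and do not appear in the original text.

```latex
H(z) = \sum_{k=0}^{K-1} a_k \, z^{-k}
\qquad\Longleftrightarrow\qquad
y(n) = \sum_{k=0}^{K-1} a_k \, x(n-k)
```

Each term a_k z^{-k} is a delay of k samples weighted by the coefficient a_k, which is the sense in which the terms represent weighted time delays.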

A problem with existing digital filtering tech-
niques is that calculations are performed with serial
processing algorithms and devices, sample-by-
sample and tap-by-tap. Yet, newer filter applica-
tions require more processing power than is avail-
able with these techniques. Some approaches to
digital filtering have improved processing speed
with custom designed circuits, but this approach
sacrifices programming flexibility. As a result, sys-
tem development is slow and unsophisticated. A
need exists for a digital filter that not only achieves
a fast throughput, but is also easily adapted to
different filter algorithms.

SUMMARY OF THE INVENTION 

One device described herein is a data input
system for a single-instruction multiple-data pro-
cessor having computational elements for process-
ing incoming data samples. A data input register
has an input element corresponding to each of the
incoming data samples, and receives sets of data 
samples from a number of data input channels in 
parallel. A control circuit in communication with the 
data input register via the data input channels 
provides the data samples on the data channels so 
that the set of data samples is stored in adjacent 
input elements in parallel. 

A technical advantage of the input control unit 
is that it provides a means for increasing the input 
rate of a single instruction, multiple data processor. 
Using an input register divided into n blocks 
achieves an input rate that is n times as fast as an 
unblocked processor. 

Another device described herein is a digital 
processing system, used in a television receiver, 
for converting the scan characteristics of an incom- 
ing television signal. The system has a pair of 
luminance field memories, one for odd fields and 
one for even fields, for storing data samples of said 
incoming signal, and has a pair of chrominance 
field memories, one for odd fields and one for even 
fields, for storing data samples of said incoming 
signal. A pair of luminance single-instruction 
multiple-data processors implement a vertical filter 
process for changing the number of lines per field 
of said signal, such that one of said processors is 
in communication with the luminance field memory 
containing odd fields and operates on samples of 
said odd fields and one of the processors is in 
communication with the luminance field memory 
containing even fields and operates on samples of 
said even fields. A pair of chrominance single- 
instruction multiple-data processors has a similar 
configuration and operation. Each processor is as- 
sociated with an instruction generator and is pro- 
grammed to carry out vertical filtering operations to 
change the number of lines per field. A processor 
control unit provides control and timing signals. 

A technical advantage of the scan converter 
unit is that the same processor may be used to 
improve the vertical resolution of an incoming tele- 
vision signal, as well as to provide a different scan 
rate. 

A third device described herein is a program- 
mable digital filtering unit. Data samples are loaded 
to a single instruction, multiple data processor hav- 
ing a number of processing elements, where every 
sample is received into a corresponding processing 
element. The processing elements correspond to 
taps of a filter function. An arithmetic unit asso- 
ciated with each processing element and near- 
neighbor communications between processing ele- 
ments are used to perform the computations of the 
filter function. Processed sample values are output 
at the corresponding processing elements so that 
the data remains ordered in a scan line. 

A technical advantage of the invention is that
input, filter calculations, and output are performed
as parallel operations, which achieves fast execu-
tion times. Also, the processor is programmable,
which permits faster development times for the
filter algorithms.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 is a block diagram of a representative
single-instruction multiple-data processor.

Figure 2 is a block diagram of a processing
element of the processor of Figure 1.

Figure 3 is a timing diagram of the processor
of Figure 1.

Figure 4 illustrates near-neighbor communica-
tions among processing elements of the processor
of Figure 1.

Figure 5A is a block diagram of a digital pro-
cessing unit that includes the processor of Figure
1.

Figure 5B is a block diagram of a typical
television receiving system that includes the digital
processing unit of Figure 5A.

Figure 6 illustrates the process of transferring
data samples of an incoming signal to a blocked
input register of a single-instruction multiple-data
processor.

Figure 7 illustrates the control circuit used in
the process of Figure 6.

Figure 8 is a timing diagram of the process of
Figure 6.

Figure 9 is a block diagram of a digital pro-
cessing unit for converting scan characteristics of
an incoming television signal.

Figure 10 illustrates one of the field memories
of the digital processing unit of Figure 9.

Figures 11A - 11B illustrate a vertical filtering
process used for scan conversion.

Figure 12 illustrates the process of using a
single-instruction multiple-data processor to imple-
ment a digital filter.

Figure 13 illustrates the use of register files of
a single-instruction multiple-data processor to pro-
vide a line memory rotation.

Figure 14 illustrates a television demodulation
process using a horizontal digital filter.

Figure 15 illustrates the relationship between
an incoming signal and processor elements of a
single-instruction multiple-data processor used for
horizontal digital filtering.

Figure 16 illustrates a first method of using a
single-instruction multiple-data processor to imple-
ment a five-tap horizontal filter.

Figure 17 illustrates a second method of using
a single-instruction multiple-data processor to im-
plement a five-tap horizontal filter.

Figure 18 illustrates a method of using a
single-instruction multiple-data processor to imple-
ment a three-tap horizontal filter.

DETAILED DESCRIPTION OF THE PREFERRED
EMBODIMENTS

Related Applications 

This application is related to United States pat-
ent applications Serial No. 119,890 (TI-13116), filed
November 13, 1987; Serial No. 435,862 (TI-
13116A); Serial No. 119,889 (TI-13117); Serial No.
256,150 (TI-13117A), filed November 13, 1987;
Serial No. 323,045 (TI-13117B), and Serial No.
402,975 (TI-13117C). These applications have a
corresponding European Patent Application No. 0 
317 218, filed November 11, 1988, and published 



This application is also related to U.S. Serial 
No. 421,499 (TI-13496), which was filed in the 
United States on October 13, 1989. These applica- 
tions are assigned to Applicant's assignee and the 
contents of these applications are hereby incor- 
porated herein by reference. 

Serial Video Processor 

Figure 1 illustrates an example of a serial video 
processor (SVP) 10, which may also be described 
as a synchronous vector processor (also SVP). The 
SVP 10 of Figure 1 is the subject of the copending 
patent applications cited above. Subsequent sec- 
tions of this application are directed to devices and 
processes that use SVP 10. However, these de- 
vices and processes are not necessarily limited to 
use with this particular SVP 10, and variations of 
SVP 10 may be used. 

The "serial video" aspects of SVP 10 derive 
from the fact that it is particularly suited for video 
processing, where discrete packets of incoming 
data, which have a uniform size, are input and 
output in a word-serial manner but are processed 
in parallel. The "synchronous vector" aspects of 
SVP 10 derive from the fact that it receives and 
processes data vectors in synchronization with a 
real time data source. Essentially, SVP 10 operates 
by using fine-grained parallelism techniques in 
which many processing elements operate on the 
data concurrently. 

SVP 10 is a general purpose, mask-program- 
mable, single instruction multiple data (SIMD), re- 
duced instruction set computing (RISC) device. 
Consistent with the SIMD characteristic, SVP 10 
has a number of processing elements (PE's), which 
execute the same instruction at the same time. 
External microinstructions control primitive logic 
and arithmetic functions for each clock cycle. 

Referring to Figures 1 and 2, SVP 10 is a one-
dimensional array of one-bit PE's 20. Each PE 20
has the following basic components: a data input
register (DIR) 11, two independently addressed
register files (R0 and R1) 12 and 15, a set of
working registers (WR's) 13, a one bit arithmetic
unit (ALU) 14, and a data output register (DOR) 16.
These are described briefly in this section, and 
reference to the related patents cited above will 
provide further description, especially with regard 
to instructions and timing. 

DIR 11 can be thought of as the "input layer".
R0 12 and R1 15, the WR's 13, and the ALU 14
are the "computational layer". DOR 16 is the
"output layer". Although each layer may be in-
dependently clocked, all PE's 20
operate in unison, every clock cycle. The input to
DIR 11 is word-serial in the sense that words of an
incoming packet of data are received into DIR 11
word by word. Similarly, the output from DOR 16 is
word-serial.

Although input and output are word-serial, pro-
cessing of each data packet is parallel. Also, be-
cause of the "layered" approach to processing,
data input, computation, and data output may be
concurrent operations, with each being indepen-
dently clocked. Each PE 20 performs these oper-
ations on an entire vector of data at once, and is
thus a "pipeline" that enables several operations to
be in various stages at once. When a vector in-
struction is executed, the elements of the vector
are fed into the appropriate pipeline one at a time,
delayed by the time it takes to complete one stage
of the pipeline. Input and output are in synchroniza-
tion with the data source, such as a video camera,
and with the data sink, such as a raster scan
display.

For purposes of illustration, SVP 10 has N
number of PE's 20, where N = 1440. The memory
size is 256 bits for each PE 20, with 128 bits each
for R0 and R1. DIR 11 is 40 bits wide and DOR 16
is 24 bits wide. These sizes are discretionary,
however, and may be changed without changing
the substance of the invention. The input and out-
put bit sizes are included in Figures 1 and 2 to
illustrate various input/output and device size rela-
tionships. However, these bit sizes may be varied
according to the application.

Using these values, a single SVP 10 can pro-
cess data packets of 1 to 1440 words by 40 bits.
Typically, the packets are equal in size and repre-
sent periodically recurring data, such as lines of a
television image, where each packet is digitized
into N number of data samples, and where each
sample, S(i), i = 1...N, is a data word used to
generate an output word. In television applications,
where SVP 10 has N PE's 20, N also represents
the number of data samples per line.

Figure 2 illustrates a single PE 20(i) and its 
associated components, where i = 1...1440. A ver-
tical slice through SVP 10 of Figure 1 yields an
individual PE 20 of Figure 2, thus each PE 20(i) 
and its components are referred to herein as a 
"column" with respect to the entire array of SVP 
10. 

DIR 11 and DOR 16 are the basic I/O devices
of SVP 10. Both DIR 11 and DOR 16 are arrays of
sequentially addressed, dual-ported memory cells.
As used in this description, "DIR 11" refers to the
entire array, and "DIR 11(i)" refers to the column of
DIR 11 that receives data sample S(i).

Referring to both Figures 1 and 2, the input
array size to SVP 10 permitted by DIR 11 is 1440
words x 40 bits. One port of DIR 11 is organized as
1440 words of 40 bits each and permits DIR 11 to
be written into from a 40 bit input line in parallel.
Thus, this first port of DIR 11 emulates the write
port of a 1440-word line memory, which permits
word-serial input. The second port of DIR 11 is
organized as 40 words of 1440 bits each, where
each bit corresponds to a PE 20(i). This second
port provides an interface between DIR 11 and
PE's 20. It is physically a part of, and is mapped
into, the absolute address space of R0 12. This
permits the contents of DIR 11 to be addressed for
selection to write into memory, and the contents are
read in parallel.
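
The two views of DIR 11 described above can be sketched in software. The following Python model is only an illustration under stated assumptions: the class name `DataInputRegister` and its methods are hypothetical, and the model ignores all timing. It shows one side being written a word at a time while the other side reads one bit from every column at once.

```python
# Illustrative model of a dual-ported input register (all names hypothetical).
N_PE = 1440      # number of processing elements (columns), as in the text
WORD_BITS = 40   # width of each input word, as in the text


class DataInputRegister:
    def __init__(self, n_pe=N_PE, word_bits=WORD_BITS):
        self.n_pe = n_pe
        self.word_bits = word_bits
        self.words = [0] * n_pe   # one word per PE column

    def write_word(self, i, value):
        """First port: word-serial write of one word into column i."""
        self.words[i] = value & ((1 << self.word_bits) - 1)

    def read_bit_plane(self, b):
        """Second port: read bit b of every column at once (one bit per PE)."""
        return [(w >> b) & 1 for w in self.words]


dir11 = DataInputRegister()
for i in range(N_PE):
    dir11.write_word(i, i)        # sample for column i arrives word by word

plane0 = dir11.read_bit_plane(0)  # least significant bit of all 1440 samples
```

Here `plane0` holds 1440 bits, one per column, which corresponds to one of the 40 words of 1440 bits seen from the second port.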

Like DIR 11, DOR 16 is a two port device. In a 
manner similar to DIR 11, it provides 1-bit access 
to each ALU 14(i) and 24-bit output from SVP 10. 
One port of DOR 16 is organized as 1440 words of 
24 bits each. This port functionally emulates the 
read port of a 1440-word line memory and is used 
for word-serial output. The second port of DOR 16
is organized as 24 words of 1440 bits each, where
each bit corresponds to a PE 20(i). This second port
couples to R1 15, and is written to in parallel. 

The write and read control signals to DIR 11 
and from DOR 16 are explained in detail in subse- 
quent sections of this application, but in general, 
DIR 11 and DOR 16 each have a 1440-bit word 
selection commutator, which controls loading to 
and reading from DIR 11 and DOR 16, respec- 
tively. Also, DIR 11 and DOR 16 each have an 
enable and a reset signal. 

The data inputs to DIR 11 are controlled by the 
signals Write Enable (WE), Reset Write (RSTW), 
and Serial Write Clock (SWCK). WE controls both 
the write function and the address pointer incre- 
ment function synchronously with SWCK, which is 
the data sample clock input. When high, RSTW 
resets the address pointer to the first word in DIR 
11 on the next rising edge of SWCK. The control 
signals for DOR 16 are Read Enable (RE), Reset 
Read (RSTR), and Serial Read Clock (SRCK), 
which operate in an analogous manner. 
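
The write-side behavior just described can be approximated by a small state machine. This is a minimal sketch, not the actual circuit: the class `DirWritePort` and its method are hypothetical names, and the choice that a reset edge takes priority over a write on the same edge is an assumption that the text does not state.

```python
class DirWritePort:
    """Toy model of the DIR write-side control signals WE, RSTW, and SWCK."""

    def __init__(self, n_words=1440):
        self.mem = [None] * n_words
        self.pointer = 0

    def swck_rising_edge(self, data, we, rstw):
        """Apply one rising edge of SWCK with the given control levels."""
        if rstw:
            # RSTW high: the pointer returns to the first word on this edge.
            # Assumption: no sample is stored on a reset edge.
            self.pointer = 0
            return
        if we and self.pointer < len(self.mem):
            # WE high: store the sample and advance the pointer together.
            self.mem[self.pointer] = data
            self.pointer += 1


port = DirWritePort(n_words=4)
port.swck_rising_edge(None, we=False, rstw=True)   # reset the pointer
for sample in (10, 11, 12):
    port.swck_rising_edge(sample, we=True, rstw=False)
port.swck_rising_edge(99, we=False, rstw=False)    # WE low: nothing stored
```

A read-side model for DOR 16 with RE, RSTR, and SRCK would follow the same pattern, with the pointer selecting the word to output.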

R0 12 and R1 15 each have 128 words by 1 bit
of read/write memory per PE 20. Different address-
ing structures cover the R0 12 and R1 15. How-
ever, R0 12 and R1 15 share the same control and
timing circuitry. R0 12 and R1 15 are comprised of
random access memory (RAM) cells. If dynamic
RAM cells are used, they must be refreshed, but
typical digital television applications perform the
refresh by operating in a faster cycle time than the
required refresh period.

Each R0 12(i) and R1 15(i) is independently
addressable, and is capable of a 1-bit read-modify-
write cycle such that it can be read, the data
operated on by ALU 14, and the result written back
to it in a single clock cycle. R0 12 and R1 15 read
data at the same time, but write separately.

The working register (WR) set 13(i) for each
PE 20(i) comprises four registers: M, A, B, and C.
These registers are the same, except for their data
sources and destinations. Each WR 13(i) is asso-
ciated with an input multiplexer for providing data
to the four inputs of each ALU 14(i). The M register
is used for division, multiplication, and logical and
conditional operations. Registers A, B, and C are
addend, minuend, and carry/borrow registers, re-
spectively.

ALU 14 is a simple full adder/subtracter and a
one-bit multiplier. The inputs to ALU 14 are from
the WR's 13. These ALUs carry out whatever in-
struction is specified by the control unit of SVP 10.
A feature of SVP 10 is that each ALU 14 executes
instructions from a set of instructions that operate
on data directly. A control unit, which feeds an
instruction stream to SVP 10, has an additional set
of instructions that provide basic execution control.
The control unit is further described below in con-
nection with Figure 5A.
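
Because each ALU 14 is one bit wide, a multi-bit operation is naturally built from repeated one-bit steps. The following Python sketch shows one plausible way that could work for addition, assuming operands are stored as bit planes (one bit per PE per plane) and that a per-PE carry is kept between steps, in the role the text assigns to the C register. The function names are hypothetical and the sketch does not reproduce the actual SVP instruction set.

```python
def bit_serial_add(a_planes, b_planes):
    """Add two operands stored as bit planes, least significant plane first.

    Each plane is a list holding one bit per PE. Every PE performs the same
    one-bit full-adder step on each pass, as in a SIMD array.
    """
    n_pe = len(a_planes[0])
    carry = [0] * n_pe                 # plays the role of the C register
    sum_planes = []
    for a_bits, b_bits in zip(a_planes, b_planes):
        s = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carry)]
        carry = [(a & b) | (c & (a ^ b))
                 for a, b, c in zip(a_bits, b_bits, carry)]
        sum_planes.append(s)
    sum_planes.append(carry)           # final carry becomes the top bit
    return sum_planes


def to_planes(values, width):
    """Split integers into bit planes, least significant plane first."""
    return [[(v >> k) & 1 for v in values] for k in range(width)]


def from_planes(planes):
    """Reassemble integers from bit planes."""
    return [sum(bit << k for k, bit in enumerate(col)) for col in zip(*planes)]


a = [3, 200, 255, 17]
b = [5, 100, 255, 0]
result = from_planes(bit_serial_add(to_planes(a, 8), to_planes(b, 8)))
```

With 8-bit operands the loop runs eight passes regardless of how many PEs there are, which is why the cost of an operation in such an array depends on word width rather than on the number of samples per line.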

Figure 3 is a timing diagram of a single cycle
of SVP 10. A processing clock (PCLK) is one of
three clocks of SVP 10, where each clock cor-
responds to an input, computational, or output lay-
er. Although the clocks are asynchronous to permit
concurrent operations of these three layers, the
input and output clocks stop to permit data trans-
fers into and out of the computational layer.

In Figure 3, one PCLK cycle, N, has a period
T. The labeled timing points indicate interlocked
edges, where NCGATE and PCGATE are control
signals for sense amplifiers (not shown) and YSEL
0/1 indicates a select signal for R0 12 or R1 15.
The sense amplifiers amplify and control the BIT-
LINES for R0 12 and R1 15 transfers. To achieve
single-cycle, 1440-bit, parallel computations, data
transfers between R0 12, R1 15, and ALU 14 are
precisely timed. Each such data transfer is held off
by a computation interlock circuit until the end of
computation is indicated. This technique achieves a
fast memory/processor data transfer rate.

Figure 4 illustrates the near neighbor commu-
nications among PE's 20. A left/right (L/R) bus 41
provides direct memory and register read/write
from each PE 20 to its four nearest neighbor PE's
20, i.e., the two PE's 20 to the left and the two
PE's 20 to the right. To accomplish such commu-
nication, each PE 20 generates one output, which
is fanned out to its four neighbor PE's 20. This
output may be from any one of four sources: a
logical 0, the contents of the B register of WR 13,
or a location from either R0 12 or R1 15. Each PE
20 also receives four signals, one from each of its
four nearest neighbors.

As will be explained below, many digital signal
processing tasks involve the use of filter algorithms
to remove unwanted signal artifacts. The L/R com-
munications of Figure 4 are especially useful for
multi-tap FIR filters, which can be factored into five
or fewer taps.
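
To illustrate why access to two neighbors on each side suits a five-tap filter, the following Python sketch computes a centered five-tap FIR in which output i uses only samples i-2 through i+2. It is a plain software model, not the SVP method: the function name is hypothetical, and treating out-of-range neighbors as 0 at the ends of the line is an assumption.

```python
def five_tap_fir(samples, coeffs):
    """y(i) = sum over k = -2..2 of coeffs[k + 2] * samples[i + k].

    Each output depends only on the element itself and its two nearest
    neighbors on each side. Out-of-range neighbors are read as 0
    (an assumption about edge handling).
    """
    n = len(samples)

    def neighbor(i, offset):
        j = i + offset
        return samples[j] if 0 <= j < n else 0

    return [
        sum(c * neighbor(i, k) for c, k in zip(coeffs, range(-2, 3)))
        for i in range(n)
    ]


line = [0, 0, 10, 0, 0, 0]
smoothed = five_tap_fir(line, [1, 2, 4, 2, 1])
```

In an array of PE's, every output in the list comprehension would be computed at the same time, one per processing element, rather than in a loop over i.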

SVP Video Applications 

As indicated above, SVP 10 is especially useful 
for digital video processing. Each signal segment 
that represents a horizontal line of an incoming 
television signal is digitized as a data "packet" 
comprised of data samples. SVP 10 loads, pro-
cesses, and outputs data for each pixel on a hori-
zontal line in parallel. The architecture of SVP 10
permits data vectors from multiple pixels, multiple 
lines, or multiple fields to be processed in parallel, 
and hence SVP 10 is capable of the "three dimen- 
sional processing" required for digital television. 

A particular advantage of using SVP's 10 is 
that discrete line memories are not required. Line- 
by-line storage is emulated in the processing of 
SVP 10, using a software procedure, referred to as 
"global rotation". This procedure is explained in the 
above-cited U.S. patent application, Serial No. 
421,499 and in connection with Figure 13 below. 

Figure 5A illustrates a basic processor system 
50a having a single SVP 10. The television receiver 
circuitry surrounding processor system 50a is de- 
scribed in connection with Figure 5B, which also 
illustrates data inputs to SVP 10. In contrast, Figure 
5A illustrates the control, address, and instruction 
inputs to SVP 10, and may be supplemented with 
the description of the same circuits in the above-
cited U.S. patent application, Serial No. 421,499.

Referring now to Figure 5A, the basic compo-
nents of processor system 50a are SVP 10, an
SVP control unit 51, and an instruction generator
52. The use of one SVP 10 versus more than one
SVP 10 is dependent on the complexity of the
processing tasks and hence on the execution time.
For full-screen real-time video processing, the op-
erations performed on a line of picture data must
be executed in a single 1H period, where H repre-
sents the period of one horizontal scan line. How-
ever, if 1H is not enough time, more than one SVP
10 may be interconnected and processing tasks
partitioned among them.
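
The size of the 1H budget can be estimated from the nominal NTSC figures quoted in the Background section (525 lines per frame, 60 fields per second, two fields per frame). The following Python sketch is a rough calculation only. The helper `instructions_per_line` and its `pclk_hz` parameter are hypothetical, since the text does not give a processing clock rate, and the estimate ignores blanking intervals and any overhead.

```python
LINES_PER_FRAME = 525      # NTSC figure quoted in the Background section
FIELDS_PER_SECOND = 60     # nominal field rate quoted in the Background section
FIELDS_PER_FRAME = 2       # interlaced: two fields make one frame

lines_per_second = LINES_PER_FRAME * FIELDS_PER_SECOND / FIELDS_PER_FRAME
h_period_us = 1e6 / lines_per_second   # one horizontal line period, in microseconds


def instructions_per_line(pclk_hz):
    """Upper bound on single-cycle instructions that fit in one 1H period.

    pclk_hz is a caller-supplied processing clock rate (an assumption).
    """
    return int(pclk_hz / lines_per_second)
```

Under these nominal figures there are 15,750 lines per second, so 1H is roughly 63.5 microseconds. If a task needs more instructions than `instructions_per_line` allows for a given clock, that is the situation in which processing would be partitioned among several SVP's 10.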

Each SVP 10 need not have the exact configu-
ration of Figures 1 and 2. As already stated, the
distinguishing characteristic of an SVP 10 is the
ability to process a data packet consisting of an
entire line of a television picture in parallel, using
a processing element for each pixel.

An input control unit 54a may perform more
than one type of input control, depending on the
types of tasks to be performed. For loading DIR 11,
input control unit 54a includes a means for controlling
the WE signal, which is triggered to begin at the
end of a horizontal blanking period and clocked so
that all columns of DIR 11 are loaded during one
horizontal scan period. Input control unit 54a also
controls what type of data is received into SVP 10.
A particular type of input control circuit, especially
designed for high data input rate, is described in
the next section of this application. An output con-
trol unit 54b may be configured using similar tech-
niques.

SVP control unit 51 has several components:
controller 51a, vertical timing generator 51b, hori-
zontal timing generator 51c, and constant generator
51d. Ideally, each of these devices is program-
mable and accesses its own program store mem-
ory. In Figure 5A, each of these components has
its own read only memory (ROM). To facilitate
development of processing tasks, programs may
be developed on a host system (not shown) and
downloaded to each ROM, using standard interface
techniques. A host interface 53 may be for either
parallel or serial data transfers, for example an RS-
232C interface.

In operation, SVP control unit 51 generates
control signals for SVP 10, which are synchronized
with the vertical synchronization signal and the
horizontal synchronization signal of the incoming
television transmission. These control signals in-
clude operating constants, instructions, and timing
signals. As an overview of the timing operation of
SVP control unit 51, controller 51a controls the
video signal processing at a field or frame rate,
vertical timing generator 51b controls processing at
a line rate, and horizontal timing generator 51c
controls processing at a pixel rate.

SVP control unit 51 also provides timing and
control signals to other system components, such
as for horizontal and vertical synchronization.
These latter timing signals are "external" in the
sense that they do not control processor system
50a. Instead they control devices such as field
memories, as described in subsequent sections of
this application.

Controller 51a receives and interprets external 
commands from a main television receiver control
unit (shown in Figure 5B). It generates a series of
control codes to vertical timing generator 51b and 
horizontal timing generator 51c. Controller 51a is 
programmable with a variety of instructions, includ- 
ing conditional and vectored jumps. 

Vertical timing generator 51b provides control 
codes to horizontal timing generator 51c, constant 
generator 51d, and instruction generator 52. It pro-
vides timing to external circuits requiring a timing
resolution of one horizontal line. 

Horizontal timing generator 51c generates tim- 
ing signals for circuits requiring timing edges at 
sample clock rates, such as DIR 11, DOR 16, field 
memories, and A/D and D/A converters (shown in 
Figure 5B). It is capable of producing timing edges 
with a resolution as small as one sample clock. 

Similarly, constant generator 51d provides con-
stant values to individual PE's 20. There are two
main reasons for using such constants. First, it is
possible to map waveforms onto the PE's 20. Sec-
ond, local constants distinguish the I chrominance
signal from the Q signal, and permit the PE's 20 to
multiplex and demultiplex the chrominance signal
and to modify algorithms in the horizontal direction
when merging two images.

Instruction generator 52 receives algorithm 
specifier codes from vertical timing generator 51b 
and condition flags from horizontal timing generator 
51c. It outputs microinstructions to ALU 14, and 
addresses for R0 12 and R1 15. Also, instruction
generator 52 provides basic execution control 
instructions, such as for jumps, calls and returns, 
test flags, and global rotation. Instruction generator 
52 is associated with program storage, such as a 
ROM, to which instructions may be downloaded 
from a host system (not shown). 

The various digital television processing tasks 
performed by processor system 50a may include 
scan conversion, motion detection, luminance and 
chrominance signal processing, and interpolation 
and decimation. Many of these tasks involve the 
use of filter algorithms to remove unwanted signal 
artifacts. Special configurations and programming 
for scan conversion and filtering are explained in 
subsequent sections of this application. 

Figure 5B is a block diagram of the basic 
components of a television receiving system, which 
includes processor system 50a. More specifically, 
processor system 50a is part of a digital unit 50b, 
which also includes field memory 56. For purposes 
of providing a general idea of a receiver that is not 
specific to composite or component television sys- 
tems, Figure 5B does not differentiate between 
composite and component processing, which are 
two well known alternate approaches to digital tele-
vision receiver systems. Instead, Figure 5B simply
indicates that the signals are digitized and sepa- 
rated before input into digital unit 50b. 



At the front end of the system, a video signal
from an antenna or other source is detected in the
usual manner through standard RF/IF unit 55a,
producing an analog video signal Va.

Separation and analog to digital (A/D) unit 55b
performs whatever demodulation or separation is
required for the particular signal being used and
converts the signal to digital sample data. This
data, in digital form, is referred to herein as the
"signal" due to the fact that it represents a continu-
ous incoming picture signal. Although word sizes
and sampling rates may vary, for purposes of ex-
ample herein, the sampling frequency is 4 fsc for
luminance signals and 1 fsc for chrominance sig-
nals, where fsc is the color subcarrier frequency.

For every pixel to be displayed, this conversion
produces three parallel inputs to DIR 11 of SVP 10,
i.e., a luminance sample and two chrominance
samples. With a 40-bit DIR 11, each pixel value
may be represented by a total of 40 bits. Typically,
each sample is an 8-bit word, thus each pixel is
derived from at least three 8-bit words.

Digital unit 50b has a processor system 50a
and field memory 56. Field memory 56 is simply a
standard first in, first out memory for storing fields
of video data. Field memory 56 is actually
comprised of a number of field memories 56(i), which
provide digital unit 50b with the field-delayed data
used for various processing tasks, especially
temporal filtering. Each of these field memories 56(i)
may be any one of a number of well known storage
devices, such as the TMS4C1060, manufactured
by Texas Instruments, Inc. Field memory 56 may
be a bank of DRAM's, or because random access
is not necessary, may merely provide serial input
and output. Depending on the algorithms
performed by ALU 14, field memory 56 may be part of
a feedback path to SVP 10, or it may simply
provide pre-processing or post-processing storage.

A main receiver control unit 58 receives
external signals, such as those from a key pad, remote
control, or video decoder. It decodes these signals
and transmits them to other receiver components,
such as SVP control unit 51.

From digital unit 50b, the processed video data
signal is output in parallel, as 8-bit words, to D/A
unit 57a. The resulting signals from D/A unit 57a
are the same analog signals that would be received
by display unit 57b if processor system 50a were
not included. Thus, digital unit 50b is simply
interposed in the signal path at the output of a
conventional television receiver RF/IF unit 55a.

Display unit 57b is a standard unit for
converting the processed signals into red, green, and blue
signals. This is accomplished by the usual matrix
techniques.

Display 57c receives the analog video signal
from display unit 57b. Typically, display 57c is of a
raster scan type, such as a cathode ray tube.
However, the invention could be used with any 
type of display having appropriate adapter circuits 
to use the signal generated by SVP 10. For exam- 
ple, display 57c could be used with a display 
memory (not shown) that receives the signal from 
processor system 50a and outputs all pixel ele- 
ments in parallel. 

Data Input System 

Figures 6-8 and the explanation in this section
are directed to interleaving a single-instruction
multiple-data processor to increase its data input
rate. For purposes of explanation, the input data is 
a digitized television signal. However, the device 
and process described herein is not limited to 
television signals, and any series of data words 
may be substituted for the television signal. 

Figure 6 illustrates how a single-instruction 
multiple-data processor, such as SVP 10 of Figures 
1 and 2, is configured to receive a television signal 
in this interleaved manner. SVP 10 is shown in 
relation to a luminance signal, Ya, which as stated
above, is sampled at a regular sampling rate to
produce data samples, S(n). Although Figure 6
shows only luminance processing, the chrominance 
signal is handled in a similar manner. 

Although Figure 6 does not explicitly show 
signal separation and digitization, implicit in Figure 
6 is the sampling of signal Ya into data samples, 
each sample comprising an n-bit word. In the ex- 
ample of Figure 6, each sample is 8 bits. Also not 
shown, but implicit in Figure 6, is a data buffer or 
some other temporary storage of the incoming 
signal. 

For purposes of receiving input control signals,
SVP 10 is divided into four blocks 61a - 61d, each
having an equal number of PE's 20. The division of
SVP 10 into four blocks rather than some other
number of blocks is for purposes of example, and
the number of blocks may be varied for different
applications. Figure 6 illustrates these blocks with
both a non-interleaved and an interleaved
representation.

Each block has S/n PE's 20, where S is the 
number of data samples per packet and n is the 
number of blocks. For example, if a luminance 
signal has 1440 data samples per line, SVP 10 
might have four blocks, each block having 360 
PE's 20. 

During a single SWCK period, four data words
from four input channels 62a - 62d are written to
DIR 11 in parallel. Each block 61a - 61d receives
one word. During a first time interval, sample S(n)
is received into block 61a, sample S(n + 1) into
block 61b, sample S(n + 2) into block 61c, and
sample S(n + 3) into block 61d. During a next time
interval, the next sample would be received into
block 61a, and so on. Thus, for each time interval,
DIR 11 receives four words rather than one.

As indicated in the interleaved representation
of SVP 10 in Figure 6, blocks 61a - 61d are not
comprised of adjacent PE columns. In fact, the
blocks are "virtual" in the sense that each block is
defined by being associated with one of four data
input lines. In other words, as explained below,
input control unit 54a interleaves the blocks. As a
result, each set of four adjacent PE's 20 contains a
PE 20 from each block. Each such set of four PE's
20 is referred to herein as a PE sub-block.
Because of the near-neighbor communication among
PE's 20, PE 20(n) may access both PE 20(n + 1)
and PE 20(n-1), and thereby process adjacent
samples as is required for most practical
applications.

Figure 7 is a block diagram of input control unit
54a configured for the interleaved SVP 10 of Figure
6. The basic components of input control unit 54a
are channel selector 71 and commutator 72.

Channel selector 71 selects one of four data
channels 62a - 62d for delivery of data samples to
DIR 11. Each channel delivers an 8-bit sample, and
the four samples thus delivered are referred to
herein as a "set" of data samples. The DIR 11(i)
columns of PE's 20(n modulo 4) are connected to a
first channel 62a, the DIR 11(i) cells of PE's
20(n + 1 modulo 4) are connected to a second
channel 62b, etc.

Commutator 72 controls the write enable (WE)
lines to each PE sub-block. Each commutator cell
72(i) is itself enabled by a clock signal (SWCK).
For purposes of example, the working frequency of
SVP 10, and thus the enable frequency of
commutator cells 72(i), is 27 MHz. The number of
commutator cells 72(i) is N/n, where N is the
number of processing elements and n is the number of
channels 62a - 62d.

Figures 6 and 7 are best understood with
reference to Figure 8, which is a timing diagram.
Commutator cell 72(1) is activated to enable four words
to be written into blocks 61a - 61d. These four
words are available from four input channels 62a -
62d. One word is delivered to each PE 20 of a PE
sub-block. When all four words are loaded to the
PE sub-block, the data is latched. Then, the
process is repeated for the next commutator cell
72(2) and the next four words. The result of the
interleaving and the special configuration of input
control unit 54a is that data is now read into SVP
10 at a rate of 108 MHz rather than 27 MHz.
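
The interleaved loading described above can be sketched in software. The following is only an illustrative model, not the circuit of Figures 6-8: the function name, the list used to stand in for DIR 11, and the zero-based block numbering are assumptions made for the example. It uses the 1440-sample line and four channels mentioned in the text.

```python
def interleaved_load(samples, num_channels=4):
    """Model one line being written to DIR 11, one set per SWCK period.

    Hypothetical helper: 'samples' stands in for the buffered line and
    the returned list stands in for the cells of DIR 11.
    """
    if len(samples) % num_channels:
        raise ValueError("sample count must be a multiple of the channel count")
    dir_cells = [None] * len(samples)
    # PE i belongs to virtual block (i modulo num_channels).
    block_of_pe = [pe % num_channels for pe in range(len(samples))]
    swck_periods = 0
    # Each commutator cell write-enables one PE sub-block of num_channels cells.
    for cell in range(len(samples) // num_channels):
        base = cell * num_channels
        for channel in range(num_channels):
            # All channels of a set are written in the same SWCK period.
            dir_cells[base + channel] = samples[base + channel]
        swck_periods += 1
    return dir_cells, block_of_pe, swck_periods


line = list(range(1440))                    # 1440 samples per line
dir_cells, block_of_pe, periods = interleaved_load(line)
effective_rate_mhz = 27 * 4                 # 27 MHz SWCK, four words per period
print(periods, block_of_pe[:8], effective_rate_mhz)
```

In this model the line is loaded in 360 clock periods, which corresponds to the N/n commutator cells of the text, and adjacent cells always hold adjacent samples.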

Although the above description is applied to
DIR 11, the same techniques are applicable to
DOR 16. In other words, to increase the output
frequency of SVP 10 by a factor of n, DOR 16
could be divided into n channels and output control
circuit 54b configured to read out data at a desired
rate and in the proper sequence.

Television Scan Rate Converter 

Figure 9 illustrates an application of the input 
control unit 54a of Figures 6-8, a digital scan con- 
verter unit 90. Converter unit 90 is part of a televi- 
sion receiving system, such as shown in Figure 5B. 
It converts an incoming television signal produced 
at one scan rate into a signal that results in a 
television picture at a different scan rate. The sys- 
tem of Figure 9 may be interposed in the system 
of Figure 5B, or may replace the SVP 10 of Figure 
5B, with appropriate inputs. 

For purposes of example, the following de- 
scription is in terms of converting a 1250-line inter- 
laced 50 Hz scan rate signal into a 900 line inter- 
laced 100 Hz scan rate signal. It should be under- 
stood, however, that scan converter 90 is not limit- 
ed to these conversion values, and may be easily 
modified to accommodate conversion of other verti- 
cal resolution and scan rate values. 

To achieve the 1250/2:1/50 Hz to 900/2:1/100 
Hz conversion, two problems must be solved. First, 
the number of lines must be converted in the 
proper ratio. Second, if the frequency is different, 
the field rate must be converted. The conversion 
process must produce three lines for every four 
input lines and produce four fields for every two 
input fields. 

The input to scan converter unit 90 is data 
words representing luminance and chrominance 
signals from an appropriate conversion and separa- 
tion unit 55b. For purposes of example, it is as- 
sumed that the input rate to converter unit 90 is 54 
MHz. 

The Yd and Cd samples are loaded into field 
memory 56, with separate field memories, 56(Y) 
and 56(C), for luminance and chrominance signals. 
Furthermore, each field memory 56(Y) and 56(C) 
has an odd field memory and an even field mem- 
ory 56. For each type of signal, Yd and Cd, odd 
and even numbered samples are loaded to a cor- 
responding odd or even field memory 56, which 
may be field memory 56(Y,odd), 56(Y,even),
56(C,odd), or 56(C,even).

An underlying assumption of scan converter 90 
is that the scan rate cannot simply be doubled by 
doubling the rate at which data is read from field 
memory 56. This is a reasonable assumption in 
that, under current technology, the maximum 
speed of memory read and write operations is 
typically less than 33 MHz. Thus, the output rate of
field memory 56 and the input rate of DIR 11 are 
similarly limited. 

Converter unit 90 solves the limitations of
memory output rates by configuring each field
memory 56 to provide n parallel outputs in one
time interval. These outputs represent n data
words, which are loaded in parallel to SVP 10. The
loading is accomplished by configuring and
controlling SVP 10 in accordance with the input control
techniques described in the preceding section of
this application.

Referring now to Figure 10, the division of a
single field memory 56 into four parts for n
channels of data is illustrated. The output frequency of
field memory 56 is the input frequency to input control
unit 54a. In this example, because of the even/odd
split of field memory 56, the output frequency is 27
MHz. Although Figure 10 shows only one field
memory 56, the other field memories 56 are
configured in the same manner.

Because of the division of field memory 56 into
n channels, the overall output frequency (OF) of
field memory 56 is expressed as:

OF = n * CF

, where CF is the frequency of each channel.
The primary limitation is that CF be less than the
maximum output rate, i.e., 33 MHz in this example.
For example, if CF = 27 MHz, and there are four
channels, OF = 108 MHz. This is twice the input
frequency (IF) of 54 MHz, as is required for
doubling the field rate.
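
As a worked check of the relation OF = n * CF, the short sketch below computes the overall output frequency and the smallest channel count that keeps each channel under the memory speed limit. The function and constant names are hypothetical and serve only this example; the 33 MHz limit, 27 MHz channel rate, and 54 MHz input rate are the example values of the text.

```python
MAX_MEMORY_RATE_MHZ = 33.0   # memory speed limit used as an example in the text


def overall_output_frequency(n_channels, channel_mhz, limit_mhz=MAX_MEMORY_RATE_MHZ):
    """Return OF = n * CF, rejecting a channel rate at or above the limit."""
    if channel_mhz >= limit_mhz:
        raise ValueError("channel frequency must be below the memory limit")
    return n_channels * channel_mhz


def channels_needed(target_mhz, limit_mhz=MAX_MEMORY_RATE_MHZ):
    """Smallest n for which target_mhz / n stays below the memory limit."""
    n = 1
    while target_mhz / n >= limit_mhz:
        n += 1
    return n


of_mhz = overall_output_frequency(4, 27.0)
ratio = of_mhz / 54.0            # output frequency relative to the 54 MHz input
min_channels = channels_needed(108.0)
print(of_mhz, ratio, min_channels)
```

With these example values, three channels would require 36 MHz per channel, which exceeds the limit, so four channels is the smallest workable split.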

By altering the number of channels or the read
frequency of field memory 56, other OF values can
be obtained. In fact, a simple formula can be used
to determine the required number of channels and
frequency per channel for a desired ratio of output
field scan rate to input field scan rate:

output rate / input rate = (n * CF) / IF

As explained below, if the number of lines per field
is also varied, a decimation or interpolation ratio
will affect the effective output rate from SVP 10.

For altering the vertical resolution, SVP's 10
are used as vertical filters, in that they operate on
current and previous lines from the same frame.
Each SVP 10 receives the output of a field memory
56 and performs whatever filtering is desired. The
filtering function is either decimation or
interpolation, depending on whether the conversion is to a
smaller or larger number of lines per frame.

In general, to implement the filter process, a
filter function must be obtained, using the desired
output characteristics. Each input data line
represents a filter tap. The number of filter coefficients
depends on the decimation ratio, and the
coefficient values depend on a motion signal. For a
five-tap filter, the general form of the filter function is:

y_n = L_n * x_0 + L_{n-1} * x_1 + L_{n-2} * x_2 + L_{n-3} * x_3 + L_{n-4} * x_4

, where L_n - L_{n-4} represent five data lines and
x_0 - x_4 represent coefficient values.

Using the example of converting 1250 lines to
900 lines, the ratio of the actual lines used is
2(576)/2(432) = 4/3, because 2(576) represents the
number of lines actually used in a 1250 line display.
Thus, a decimation filter must implement a decima- 
tion ratio of 4:3. For every four input lines, SVP 10 
must generate three output lines. 

Figures 11A - 11C illustrate a vertical filtering
process that accomplishes a 4:3 decimation. The
filter is a five-tap filter, and thus uses five data
lines. The filter function has 3 sets of coefficients,
sets A, B, and C. The coefficient values for each
set are a_0 - a_4, b_0 - b_4, and c_0 - c_4, with values
determined using digital filter design techniques.

In Figure 11A, lines L'_n - L'_{n+4} are incoming
lines, where L'_n represents the line that is earliest
in time. Lines L_n - L_{n-4} represent the same five data
lines from the processing point of view. From this
point of view, the earliest incoming line, L'_n, is
delayed five lines from the current input line, L_n.
The data lines produce an output line, y_n.

In Figure 11B, lines L'_n - L'_{n+4} again represent
incoming lines, but a new input line takes the place
of the fifth earlier data line. This is, in effect, a
"rotation" in which the five most recent lines are
always available for the filter process. For this step,
the filter function uses the set B coefficients to
produce the next y_n.

In Figure 11C, the incoming data lines are
again rotated so that the five lines most recent in
time are available for the filter function. This step
produces a third output line, y_n, using the set C
coefficients. A fourth line would be generated with
the same function, but using a fourth set of data
lines and the coefficients of set A.
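
The rotation through coefficient sets A, B, and C can be sketched as follows. This is a minimal model under several assumptions that the text does not specify: the coefficient values are placeholders rather than designed filter coefficients, the window is primed by repeating the first line, and the fourth input line of each group of four is the one that produces no output.

```python
from collections import deque

# Placeholder coefficient sets (assumed values, each summing to 1).
COEFFICIENT_SETS = {
    "A": [0.0, 0.25, 0.5, 0.25, 0.0],
    "B": [0.125, 0.25, 0.25, 0.25, 0.125],
    "C": [0.0, 0.125, 0.75, 0.125, 0.0],
}


def decimate_4_to_3(lines):
    """Produce three output lines for every four input lines."""
    if not lines:
        return []
    # Five most recent lines, newest first (L_n ... L_{n-4}).
    window = deque([lines[0]] * 5, maxlen=5)
    outputs = []
    for index, line in enumerate(lines):
        window.appendleft(line)          # the oldest line drops out
        if index % 4 == 3:               # assumed: every fourth line yields no output
            continue
        coeffs = COEFFICIENT_SETS["ABC"[len(outputs) % 3]]
        outputs.append([
            sum(c * tap[k] for c, tap in zip(coeffs, window))
            for k in range(len(line))
        ])
    return outputs


field = [[100] * 4 for _ in range(8)]    # eight flat input lines
out = decimate_4_to_3(field)
print(len(out))
```

Because each placeholder set sums to one, a flat input field passes through unchanged, which is a convenient sanity check on any real coefficient set as well.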

Figure 12 illustrates the process of using SVP 
10 for vertical filtering. During the first step SVP 10 
receives data representing a horizontal line of an 
incoming television signal. If the scan rate is also 
being converted, the data is input in accordance 
with the scan conversion techniques described 
above, and the SVP 10 of Figure 12 may be the 
same as any SVP 10 of Figure 9. At the same time
as the data is input to DIR 11, a processed
horizontal line is output from DOR 16.

In the second step, the contents of DIR 11 are
transferred to R1 15. In the third step, previously
stored lines from R0 12 and the new line from R1 15
are used for the filter computations, which are
performed by ALU 14. For a five-tap filter, the number
of previously stored lines used for the
computations is four. In the fourth step, the processed line
is transferred to DOR 16.

The final step uses a global rotation process, in
which an individual line memory subset of R0 and
R1 may be circularly rotated rather than shifted
throughout the memory bank. This global rotation
process is one of the advantages of using SVP 10,
which eliminates the need for external line
memories.

Figure 13 illustrates the global rotation process.
Five taps from an input signal represent input data
samples to a single register file of a PE 20, which
may be either R0 12 or R1 15. Each tap is delayed
by one horizontal time period, thus the taps
represent samples from corresponding sample positions
of consecutive lines. Part of the 128 bit memory of
R0 12 or R1 15 is allocated as global rotation
memory space. For a five-tap filter in which each
sample is 8 bits, a 40-bit space is used. This 40-bit
space is configured so that each bit of a sample
from one line position can be shifted to the
corresponding bit positions of the next line's position.
The line spaces are labeled as line space A - E.

During a global rotation, the first step is to shift
each 8-bit sample to the next higher-addressed
8-bit line space. Then, new sample data is written to
the first space, i.e., space A. The data that was in
space E may be overwritten because it is no longer
needed.
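
The global rotation of one PE's 40-bit space can be modeled with integer shifts. The sketch below is an illustration only: it assumes that line space A occupies the lowest-addressed 8 bits and maps it to the least significant byte of a Python integer, and the function names are hypothetical.

```python
SAMPLE_BITS = 8
TAPS = 5
SPACE_MASK = (1 << (SAMPLE_BITS * TAPS)) - 1   # 40-bit global rotation space


def global_rotate(space, new_sample):
    """Shift every sample up one line space and write the new sample to space A."""
    if not 0 <= new_sample < (1 << SAMPLE_BITS):
        raise ValueError("sample must fit in 8 bits")
    # The old contents of space E fall off the top and are discarded.
    return ((space << SAMPLE_BITS) & SPACE_MASK) | new_sample


def line_spaces(space):
    """Return the samples held in line spaces A through E, in that order."""
    return [(space >> (SAMPLE_BITS * i)) & 0xFF for i in range(TAPS)]


space = 0
for sample in [10, 20, 30, 40, 50, 60]:    # same sample position on six lines
    space = global_rotate(space, sample)
print(line_spaces(space))
```

After six lines, the first sample has been discarded and the five most recent samples remain available as the five filter taps.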

Referring again to Figure 9, the decimation
process is illustrated in the context of generating a
picture from an incoming television signal. The
luminance and chrominance signals are processed
in the same manner at the same time. The
following description is directed to luminance signal
processing. During one separation and conversion of
one input field, the previous odd field and even
field, which are stored in field memory 56(Y,odd)
and 56(Y,even), are used by SVP 10(1), SVP 10(2),
and SVP 10(3) to generate an output field.

Each SVP 10 performs the same filtering
operation, but operates on different data and uses
different filter coefficients. More specifically, SVP
10(1) and SVP 10(2) are used for still areas of the
picture where there is no field-to-field motion to
cause a blurring side effect of filtering. SVP 10(3)
is used for areas of the picture in which there is
motion.

A motion detection unit 91 is used to generate
signals, My and Mc. Depending on whether motion
is detected, My or Mc selects the appropriate output
from SVP 10(1) and SVP 10(2) or from SVP 10(3).
Various motion detection methods may be used for
generating My and Mc.

For still areas of the picture, to generate one
line of output, SVP 10(1) and SVP 10(2) each use five
lines from field memories 56(Y,odd) and
56(Y,even). SVP 10(1) calculates odd lines as:

y_odd = L_{n,odd} * a_0 + ... + L_{n-4,odd} * a_4

. SVP 10(2) calculates even lines as:

y_even = L_{n,even} * a_0 + ... + L_{n-4,even} * a_4

At the same time, SVP 10(3) calculates lines
by using, alternately, lines from field memory
56(Y,odd) and from field memory 56(Y,even). Thus,

y_n = L_{n,(odd/even)} * alpha_0 + ... + L_{n-4,(odd/even)} * alpha_4



As stated above, whether a still output line or a
motion output line is used depends on the state of
the motion detection signal. This process of
generating each new line continues until the desired
number of lines is generated. Using the 4:3
decimation example, the number of output lines is
432(2) = 864. The data for these lines is output from
DOR 16 at a rate consistent with the scan rate
conversion ratio. In the example used herein, the
output rate is 108(3/4) = 81 MHz.
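
The line count and output rate quoted above follow directly from the 4:3 decimation ratio. The arithmetic can be checked with the short sketch below, in which the variable names are chosen for this example only.

```python
from fractions import Fraction

decimation = Fraction(3, 4)          # three output lines per four input lines

active_input_lines = 2 * 576         # lines actually used in a 1250 line display
active_output_lines = active_input_lines * decimation

field_memory_output_mhz = 4 * 27     # four channels at 27 MHz each
svp_output_mhz = field_memory_output_mhz * decimation

print(active_output_lines, svp_output_mhz)
```

Exact fractions are used so that the 3/4 ratio introduces no rounding in the check.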


Finite Impulse Response Filters 

SVP 10 is useful for other filtering processes,
not necessarily limited to television processing. In
particular, there are many applications in which
horizontal filtering among data samples of a single
data packet is desirable. The following description
is directed to using a single-instruction
multiple-data processor, such as SVP 10, for a horizontal
finite impulse response (FIR) filter process.

Although for purposes of example, a low pass
filter is described, the same techniques may be
used to implement any type of horizontal FIR filter.
Also, for purposes of illustration, an example of
filtering the chrominance signal of a PAL
transmission is used. Figure 14 illustrates the basic steps of
the separation and demodulation process, including
the low pass filtering step.

Figure 15 illustrates SVP 10 in relation to one
line of an input signal, Va. For purposes of example,
Va has 1024 samples, which are designated as
S(i), i = 1...1024. SVP 10 has a corresponding
number of PE's 20(i), i = 1...1024. Figure 15 is
merely representative of the process and does not
purport to explicitly illustrate the various
components of the receiving system front end, such as
are illustrated in Figure 5B, prior to input into SVP
10.

Va is received word-serially into DIR 11. In this
example, 1024 samples are loaded to DIR 11
during each 1H period. Transfers of sample values
between the register files R0 12 and R1 15 and the
operations of ALU 14 achieve the sample delays and
computations of the filter function. Data transfers
from DIR 11 to memory R0 12 and R1 15 are via
ALU 14, such as are explained in the above-cited
patent applications, U.S. Serial No. 435,864 and
Serial No. 421,499.

The general concept of using SVP 10 for hori- 
zontal filtering is to correspond filter taps to PE's 
20. The terms of the filter function are realized by 
adding calculated values for each tap to memory in 
near-neighbor processors to the right. For each
output sample, the desired value ends up in the 
appropriate PE 20. 

A first method of using SVP 10 for horizontal 
filtering is illustrated in Figure 16. An example of a 
desired filter function is: 

H(z) = (1 + z^{-1})^2 (1 + z^{-2}) (1 + z^{-1} + z^{-2}) / 24

, where the notation z^{-n} represents a delay of n
sample values. Many algorithms exist for calculating
coefficient values to obtain a desired filter output
response. The divisor, 24, ensures a unity
gain and in terms of digital processing, is needed
because each term of the function increases the
number of bits. It is obtained by multiplying out the
filter function, and recognizing that at zero
frequency, the sum of the coefficients is 24.

Figure 16 is a process diagram, illustrating the
computational steps of the filter process. However,
Figure 16 represents computations of only a
segment of SVP 10, specifically, PE 20(n-2),...20(n),
...20(n + 2), where (n-2), (n) and (n + 2) identify PE's
20 that receive samples of line X having a
corresponding sample number. The entire parallel
computation filters an input sample, S(n), from
line X to result in the output sample, S(n)'.

At the beginning of the computation for line X,
S(1) through S(1024) are transferred in parallel
from DIR 11 to the corresponding R1 15 for that
PE 20. Thus, R1 15(n) of PE(n) contains S(n). The
left neighbor PE(n-1) contains the preceding sam- 
ple, i.e., S(n-1). The right neighbor PE(n + 1) con- 
tains the next sample, i.e., S(n + 1). 

The computations to produce a single filtered
sample value begin by adding a first sample to its
preceding sample. Thus, S(n) is added to S(n-1).
The parallelization of the computation requires
S(n-1) to be located in PE 20(n-1). The result of the
addition is SUM 1(n). The next step is adding SUM
1(n) to SUM 1(n-1) to obtain SUM 2(n).

Each of the above summing steps involves 
only a one-processor delay. To obtain the two- 
processor delay of the next term of the system 
function, SUM 2(n) is added to SUM 2(n-2) to 
obtain SUM 3(n). 

To complete the process, there are two
alternate methods. In the first method, SUM 3(n-2) is
transferred into R0 12(n-1) and then added to
SUM 3(n-1) in R0 to obtain SUM 4(n). Then SUM
4(n) is added to SUM 3(n) to obtain SUM 5(n). In
the second method, which is shown in dotted lines
in Figure 16, SUM 3(n-2) is added to SUM 3(n-1) to
obtain SUM 4(n-1). Then, SUM 4(n-1) is added to
SUM 3(n) to obtain SUM 5(n).

SUM 5(n) is divided by some predetermined
constant. In this example, the constant is 24,
derived as explained above.
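
The summing steps of Figure 16 can be checked against the expanded coefficients with a short sketch. This is a serial software model, not the parallel PE implementation; the helper names are hypothetical, positions to the left of the first sample are assumed to hold zero, and the intermediate SUM 4 is stored at position n regardless of which of the two alternate methods is used.

```python
def delayed(values, k):
    """Return values[n-k] for every n, with zeros left of the line start."""
    return [0] * k + values[:len(values) - k]


def add(a, b):
    """Element-wise sum, modeling one parallel add across all PEs."""
    return [x + y for x, y in zip(a, b)]


def factored_lowpass(samples):
    """Unnormalized SUM 5 for every sample position (divide by 24 for unity gain)."""
    sum1 = add(samples, delayed(samples, 1))      # (1 + z^-1)
    sum2 = add(sum1, delayed(sum1, 1))            # (1 + z^-1)^2
    sum3 = add(sum2, delayed(sum2, 2))            # times (1 + z^-2)
    sum4 = add(delayed(sum3, 2), delayed(sum3, 1))
    sum5 = add(sum4, sum3)                        # times (1 + z^-1 + z^-2)
    return sum5


impulse = [1] + [0] * 9
response = factored_lowpass(impulse)
print(response[:7], sum(response))
```

Feeding a unit impulse through the cascade returns the filter coefficients themselves, and their sum gives the divisor needed for unity gain at zero frequency.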

It should be understood that each PE 20
performs the filter function simultaneously. For
example, in a five-tap filter function, the PE's 20(n) each
process a tap and generate the result of the filter in
parallel. Each PE 20(n) overlaps the next PE
20(n + 1) by four taps. In other words, the filter's data
is taken relative to each PE 20(n). Four of five
sample values processed by each PE 20(n) overlap
the sample values being processed by its neighbor
PE 20(n + 1).

The filter of Figure 16 requires the filter
function to be factored into the above-described form.
Yet, not all filter operations can be factored in this
manner. The same low pass filter as described
above can be realized with the following function:

H(z) = (1 + 3z^{-1} + 5z^{-2} + 6z^{-3} + 5z^{-4} + 3z^{-5} + z^{-6}) / 24

. This function was derived by factoring the
previously described function into lower order terms.
The L and 2L near-neighbor communications, as
shown in Figure 4, can be used to realize delays of
one and two samples.

Figure 17 illustrates the add-multiply
calculations for this second five-tap filter:

y(n) = x(n) + 3x(n-1) + 5x(n-2) + 6x(n-3) + 5x(n-4) + 3x(n-5) + x(n-6)

, where x(n)...x(n-6) represent data sample values
delayed by 0...6 1H periods. Although only the
calculations for PE's 20(9) and 20(10), which are
arbitrarily selected, are shown, identical computations
for all PE's 20(n) are performed simultaneously.

The above filter function may be efficiently 
realized in four stages, with the following four equa- 
tions: 

y1(n) = x(n) + x(n-1)
y2(n) = x(n) + x(n-1)
y3(n) = x(n) + x(n-2)
y4(n) = x(n) + x(n-1) + x(n-2)

. These equations contain delays of no more than
two samples. An operand that is delayed by two
samples may be accessed via the 2L
communication input of each PE 20(n), as shown in Figure 4.
For PE 20(10), the first stage is:



x10b(n) = x10a(n) + x10a(n-1)

, and because of the inherent delay due to the data
structure, x10a(n-1) is the same as x9a(n). Thus,

x10b(n) = x10a(n) + x9a(n)

. As indicated in Figure 17, a value in R0 12(10) is
added to the value stored in R0 12(9) of the left
hand neighbor PE 20, and the values are summed
into R1 15(10).

The second stage is the same as the first
stage, so that:

x10c(n) = x10b(n) + x9b(n)

. For this operation, both operands are from R1 15
because it holds the result of the previous
operation.

The third stage is similar to the second stage
except that bits are summed with the accumulator
from the 2L neighbor. The equation is:

x10d(n) = x10c(n) + x8c(n)

. This represents the two-sample delay of the third
equation of the set of four equations.

The fourth stage requires an intermediate sum
in R0 12 from operands in the 2L and L neighbors.
This is added to the previously accumulated result
in R1 15(10). The function implemented in stage
four is:

x10e(n) = x8d(n) + x9d(n) + x10d(n)

. This relates directly to the fourth equation.

To verify the above operations, the values may
be substituted as:

y(10)
= 7 + 3(6) + 5(5) + 6(4) + 5(3) + 3(2) + 1
= 96

, where x(n), x(n-1), ... x(n-6) are the input values of
PE 20(4) - PE 20(10).
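
The four stages can be traced for the values used in the verification above. The sketch below is a software model under stated assumptions: PE 20(4) through PE 20(10) hold the values 1 through 7, all other PE's hold zero, and the helper `left` is a hypothetical stand-in for the L (k = 1) and 2L (k = 2) near-neighbor inputs.

```python
def left(values, k):
    """Value held by the PE k places to the left; zero beyond the array edge."""
    return [0] * k + values[:len(values) - k]


def four_stage_filter(x_a):
    """Apply the four stages to every PE at once and return all intermediates."""
    x_b = [p + q for p, q in zip(x_a, left(x_a, 1))]              # stage 1: L neighbor
    x_c = [p + q for p, q in zip(x_b, left(x_b, 1))]              # stage 2: L neighbor
    x_d = [p + q for p, q in zip(x_c, left(x_c, 2))]              # stage 3: 2L neighbor
    x_e = [p + q + r for p, q, r in zip(left(x_d, 2), left(x_d, 1), x_d)]  # stage 4
    return x_b, x_c, x_d, x_e


# List index i corresponds to PE 20(i + 1); PE 20(4)..PE 20(10) hold 1..7.
x_a = [max(pe - 3, 0) for pe in range(1, 13)]
x_b, x_c, x_d, x_e = four_stage_filter(x_a)
y_10 = x_e[10 - 1]
direct = 7 + 3 * 6 + 5 * 5 + 6 * 4 + 5 * 3 + 3 * 2 + 1
print(y_10, direct)
```

For PE 20(10) the intermediate values are 13, 24, and 40 after the first three stages, and the fourth stage sums 24, 32, and 40 from PE's 20(8), 20(9), and 20(10).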

As indicated by Figures 16 and 17, SVP 10
implements a multi-tap filter without the need to
impose delays in other parallel signal paths. The
center tap of the filter is assumed to be a reference
point and is associated with the current data. PE's
20 to the right and left of the center PE 20 are
associated with older and newer data respectively.
For example, for a five-tap filter, reaching two PE's
left and two PE's right makes up the five taps. No
delay in the horizontal direction is incurred and the
output is in phase with the reference input at the
center of the filter.

Figure 18 illustrates another method of using a
single-instruction multiple-data processor, such as
SVP 10, to implement a horizontal FIR filter. For
purposes of simplifying the explanation, an
example having only three coefficients is assumed:

H(z) = 1 + 3z^{-1} + z^{-2}

. This is a three-tap filter, and as indicated in
Figure 18, requires only a one-processor delay in
each step. Thus, this method is referred to herein
as the "one-processor delay method" as opposed
to the "two-processor delay methods" of Figures
16 and 17.

As in the two-processor delay methods, SVP
10 is loaded with a line of sample values,
S(1)...S(1024), with the 1024 length sample being used as
an example only. Figure 18 shows the calculations
only with respect to three PE's 20, i.e., PE 20(n-2),
PE 20(n-1), and PE 20(n). The registers R0 12 and
R1 15 are used as accumulators to hold temporary
results.

First, S(n-2) and S(n-1) are added to obtain
SUM 1(n-1). Then, this sum is added to twice the
value of S(n-1). The result is SUM 2(n-1). Finally,
SUM 2(n-1) is added to S(n) to obtain SUM 3(n).
As desired, SUM 3(n) ends up in the accumulator
of PE 20(n).
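
The three summing steps of the one-processor delay method can be sketched as follows. This is a serial model of what every PE 20(n) would compute in parallel; the function name is hypothetical, and samples to the left of the first position are assumed to be zero.

```python
def one_processor_delay_filter(samples):
    """Compute S(n) + 3*S(n-1) + S(n-2) using the three summing steps."""
    outputs = []
    for n in range(len(samples)):
        s_n = samples[n]
        s_n1 = samples[n - 1] if n >= 1 else 0   # left neighbor
        s_n2 = samples[n - 2] if n >= 2 else 0   # neighbor two places left
        sum1 = s_n2 + s_n1                       # SUM 1(n-1)
        sum2 = sum1 + 2 * s_n1                   # SUM 2(n-1)
        sum3 = sum2 + s_n                        # SUM 3(n), held by PE 20(n)
        outputs.append(sum3)
    return outputs


impulse_response = one_processor_delay_filter([1, 0, 0, 0])
print(impulse_response)
```

The impulse response reproduces the coefficients of H(z), which confirms that adding twice S(n-1) in the second step yields the center coefficient of three.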

The same process can be easily extended to
other filters, requiring fewer or more taps. For
real-time applications, the primary limitation as to length
is processing time. For example, to achieve
real-time television processing, the algorithm should
take no longer than the horizontal period for a line,
i.e., 1H. A particular advantage of the filter
described herein is that processing unit 50a may be
used so that programs can be developed on a host
system and downloaded to control unit 51 and
instruction generator 52.


Other Embodiments 

Although the invention has been described with
reference to specific embodiments, this description
is not meant to be construed in a limiting sense.
Various modifications of the disclosed
embodiments, as well as alternative embodiments of the
invention, will be apparent to persons skilled in the
art. It is, therefore, contemplated that the appended
claims will cover all modifications that fall within the
true scope of the invention.

Claims 

1. A data input system for a single-instruction
multiple-data processor having computational
elements for processing incoming data samples,
comprising:

a data input register having an input element
corresponding to each of said incoming
data samples;

a number of data input channels for transferring
sets of said data samples to said data
input register, wherein the data samples of
each of said sets are transferred in parallel;

a control circuit in communication with
said data input register via said data input
channels for receiving said data samples and
for providing said data samples on said data
channels such that said set of data samples are
stored in adjacent input elements in parallel.

2. The television receiving system of Claim 1, 
wherein said data input register is virtually 
divided into a number of interleaved blocks, 
such that each block receives one sample of 
said set of said data samples. 

3. The input system of Claim 1, wherein said
control circuit comprises a commutator, and
wherein said commutator provides a write enable
signal to a number of said input elements
corresponding to the number of data samples in
said set.

4. The input system of Claim 1, wherein said 
control circuit comprises a data line selector 
for receiving said data samples and for provid- 
ing said set of data samples on said data 
channels during said time interval. 

5. A processor system for processing serial data 
samples of an incoming signal, comprising: 

a single-instruction multiple-data processor 
having a number of computational elements in 
near-neighbor communication with each other, 
and having a data input register having an 
input element corresponding to each of said 
incoming data samples; 

a number of data input channels for trans- 
ferring sets of said data samples to said data 
input register, wherein the data samples of 
each of said sets are transferred in parallel; 

a control circuit in communication with 
said data input register via said data input 
channels for receiving said data samples and 
for providing said data samples on said data 
channels such that said set of data samples are
stored in adjacent input elements in parallel. 

6. The processor system of Claim 5, and further
comprising a processor control unit for gen- 
erating control and timing signals for use by 
said control circuit, and an instruction gener- 
ator for generating instructions for use by said 
processor.

7. A television receiving system for processing
video data, comprising: 

an analog to digital converter for convert- 
ing said signal into data samples; 

circuits for separating luminance and 
chrominance signals; 

a single-instruction multiple-data processor 
having a number of computational elements in 
near-neighbor communication with each other, 
and having a data input register having an 
input element corresponding to each of said 
incoming data samples; 

a number of data input channels for trans- 
ferring sets of said data samples to said data 
input register, wherein the data samples of 
each of said sets are transferred in parallel; 

a control circuit in communication with 
said data input register via said data input 
channels for receiving said data samples and 
for providing said data samples on said data 
channels such that said set of data samples are
stored in adjacent input elements in parallel; 

a processor control unit for generating 
control and timing signals for use by said 
control circuit; 

an instruction generator for generating 
instructions for use by said processor; 

a digital to analog converter for converting 
said processed signals to an analog signal for 
display; 

a display for displaying pixels generated 
from said processed data samples. 

8. The television receiving system of Claim 7, 
wherein said display is a raster scan display. 

9. A method of receiving data samples into a 
single-instruction multiple-data processor hav- 
ing an input register with elements correspond- 
ing to said data samples, comprising the steps 
of: 

providing a set comprised of a number of 
data samples on a corresponding number of 
data channels to said data input register;

providing a write enable signal to a cor- 
responding number of elements of said data 
input register during a single time interval; and 

controlling said write enable signal such 
that input elements that receive said set of 
data samples are adjacent input elements. 
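
The loading steps of Claim 9 can be sketched in software as follows. This is only an illustrative model, not the claimed circuit: the function name `load_input_register` and its parameters are hypothetical, and the wiring of input element `i` to channel `i % num_channels` is an assumption based on the interleaved blocks described in Claim 2.

```python
def load_input_register(samples, num_channels, register_size):
    """Model of parallel loading: one set of samples per time interval.

    Assumption (not stated in the claim): input element i is wired to
    channel i % num_channels, so the register acts as interleaved blocks.
    """
    register = [None] * register_size
    for start in range(0, len(samples), num_channels):
        # One time interval: a set of samples is presented on the channels.
        channel_data = samples[start:start + num_channels]
        # The write enable covers as many adjacent elements as there are
        # samples in the set, and never runs past the end of the register.
        end = min(start + len(channel_data), register_size)
        for element in range(start, end):
            register[element] = channel_data[element % num_channels]
    return register
```

Calling `load_input_register(list(range(8)), num_channels=4, register_size=8)` fills the register in two time intervals, four adjacent elements at a time, and leaves the samples in their original order.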

10. A digital processing system, used in a televi- 
sion receiver, for converting an incoming tele- 
vision signal having certain scan characteristics 
to a display signal having different scan char- 
acteristics, comprising: 

a pair of luminance field memories, one for
odd fields and one for even fields, for storing
data samples of said incoming signal;

a pair of chrominance field memories, one
for odd fields and one for even fields, for
storing data samples of said incoming signal;

a pair of luminance single-instruction
multiple-data processors for implementing a
vertical filter process for changing the number
of lines per field of said signal, wherein one of
said processors is in communication with said
luminance field memory containing odd fields
and operates on samples of said odd fields
and wherein one of said processors is in com-
munication with said luminance field memory
containing even fields and operates on sam-
ples of said even fields;

a pair of chrominance single-instruction
multiple-data processors for implementing a
vertical filter process for changing the number
of lines per field of said signal, wherein one of
said processors is in communication with said
chrominance field memory containing odd
fields and operates on samples of said odd
fields and wherein one of said processors is in
communication with said chrominance field
memory containing even fields and operates
on samples of said even fields;

an instruction generator associated with
each of said processors for providing instruc-
tions to said processors; and

a processor control unit for providing con-
trol and timing signals to said processors.

11. The processing system of Claim 10, and fur-
ther comprising a luminance motion processor
and a chrominance motion processor for im-
plementing a filter function that changes the
number of lines per field when said incoming
signal indicates motion in the transmitted tele-
vision picture.

12. The processing system of Claim 10, wherein
each of said field memories is comprised of a
number of channels, said number of channels
being determined by a desired ratio of input
data rate to output data rate, and further com-
prising an input control circuit associated with
each of said processors for accepting parallel
input from said channels.

13. A television receiving system for receiving an
incoming video signal and converting its scan
characteristics, comprising:

color separation and analog to digital units
for converting said incoming signal to data
samples representing a digitized luminance
signal and a digitized chrominance signal;

a pair of luminance field memories, one for
odd fields and one for even fields, for storing
data samples of said incoming signal;

a pair of chrominance field memories, one
for odd fields and one for even fields, for
storing data samples of said incoming signal;

a pair of luminance single-instruction
multiple-data processors for implementing a
vertical filter process for changing the number
of lines per field of said signal, wherein one of
said processors is in communication with said
luminance field memory containing odd fields
and operates on samples of said odd fields
and wherein one of said processors is in com-
munication with said luminance field memory
containing even fields and operates on sam-
ples of said even fields;

a pair of chrominance single-instruction
multiple-data processors for implementing a
vertical filter process for changing the number
of lines per field of said signal, wherein one of
said processors is in communication with said
chrominance field memory containing odd
fields and operates on samples of said odd
fields and wherein one of said processors is in
communication with said chrominance field
memory containing even fields and operates
on samples of said even fields;

an instruction generator associated with
each of said processors for providing instruc-
tions to said processors; and

a processor control unit for providing con-
trol and timing signals to said processors;

a digital to analog unit for converting said
data words to an analog signal for display;

a display unit in communication with said
digital to analog unit for generating a picture
signal for display; and

a display for displaying said picture.

14. A method of using a single-instruction multiple-
data processor to convert an incoming televi-
sion signal having certain scanning characteris-
tics to a display signal having different scan-
ning characteristics, comprising the steps of:

storing data samples representing said sig-
nal in memory;

reading a sequenced set of said data
samples from said memory in a single time
interval, wherein the number of said data sam-
ples in said set is determined by a desired
data input rate to said processor;

loading said n data words to an input reg-
ister of said processor during said time inter-
val, wherein said input register is controlled so
that said sequential order is maintained;

communicating said data words to a com-
putational layer of said processor, while retain-
ing said sequential order; and

processing said data samples, using said
processor.

15. The method of Claim 14, wherein said pro- 
cessing step includes changing the number of 
lines per field of said incoming signal using a 
vertical filter function. 

16. A digital processing system for implementing a
horizontal digital filter, comprising: 

a single-instruction multiple-data processor 
having a number of processing elements for 
receiving said samples in an ordered se- 
quence, and having an arithmetic unit asso- 
ciated with each processing element for per- 
forming computations, and having next-neigh- 
bor communications between said processing 
elements, wherein said processor performs 
computations to implement said filter by using 
said processing elements as taps of said filter 
function, such that said data samples are out- 
put from said processing elements in an or- 
dered sequence for display after processing; 

a processor control unit for providing tim- 
ing and control signals to said processor; and 

an instruction generator for providing 
instructions to said processor that determine 
execution of said computations. 

17. The digital processing system of Claim 16, and
further comprising an interface for downloading 
said instructions from a host development sys- 
tem. 

18. A television receiving system for separating 
and demodulating an incoming television sig- 
nal having luminance and chrominance signals, 
comprising: 

a stopband filter for obtaining said lumi- 
nance signal; 

a passband filter for obtaining said 
chrominance signal; 

a pair of demodulator units for obtaining 
two color difference signals from said chromin- 
ance signal; 

a pair of horizontal low pass filters for
filtering each of said color difference signals, 
wherein each of said horizontal low pass filters 
has a single-instruction multiple-data processor 
having a number of processing elements for 
receiving said samples in an ordered se- 
quence, and having an arithmetic unit asso- 
ciated with each processing element for per- 
forming computations, and having next-neigh- 
bor communications between said processing 
elements, wherein said processor performs 
computations to implement said filter by using 
said processing elements as taps of said filter 
function, such that said data samples are out-
put from said processing elements in an or-
dered sequence for display after processing,
and has a processor control unit for providing
timing and control signals to said processor,
and has an instruction generator for providing
instructions to said processor that determine
execution of said computations.

19. A two-processor delay method of implement-
ing a horizontal filter with a single-instruction
multiple-data processor, comprising the steps
of:

(1) loading an input register of said proces-
sor with data samples representing a line of
data, wherein processing elements of said
processor correspond to said samples;

(2) adding each sample value n to a sample
value n-1 to obtain a first sum;

(3) adding said first sum to the results of
step (2) for sample n-1 to obtain a second
sum;

(4) adding said second sum to the results of
step (3) for sample n-2 to obtain a third
sum;

(5) adding the results of step (4) for sam-
ples n-1 and n-2 to obtain a fourth sum;

(6) adding said third sum to said fourth sum
to obtain a fifth sum; and

(7) multiplying said fifth sum by a predeter-
mined constant value.
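
The chain of sums in Claim 19 can be traced with a short software model. This is a sketch under stated assumptions, not the claimed hardware: the helper names are hypothetical, samples to the left of the line are assumed to read as zero, and the fourth sum is read as the third-sum results for samples n-1 and n-2 added together. Under that reading the cascade is equivalent to a 7-tap filter with weights 1, 3, 5, 6, 5, 3, 1, so the default constant of 1/24 (an assumption; the claim says only "predetermined") gives unity gain on a flat line.

```python
def read_from_left(values, distance):
    """What element n sees when it reads from element n - distance.

    Assumption: positions off the left edge of the line read as 0.
    """
    return ([0] * distance + values)[:len(values)]


def add(a, b):
    """Element-wise add, as if every processing element ran one instruction."""
    return [x + y for x, y in zip(a, b)]


def two_delay_filter(line, constant=1 / 24):
    """Follow the claimed sums; no read reaches further than two neighbors."""
    first = add(line, read_from_left(line, 1))        # x[n] + x[n-1]
    second = add(first, read_from_left(first, 1))     # first[n] + first[n-1]
    third = add(second, read_from_left(second, 2))    # second[n] + second[n-2]
    fourth = add(read_from_left(third, 1),
                 read_from_left(third, 2))            # third[n-1] + third[n-2]
    fifth = add(third, fourth)
    return [constant * s for s in fifth]
```

Feeding a single unit sample with `constant=1` returns the tap weights directly, which is a convenient way to check what filter a given sequence of neighbor additions implements.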

20. A one-processor delay method of implement-
ing a horizontal filter with a single-instruction
multiple-data processor, comprising the steps
of:

(1) loading an input register of said proces-
sor with data samples representing a line of
data;

(2) adding each sample value n-1 to a sam-
ple value n-2 to obtain a first sum;

(3) adding each sample value n-1 to said
first sum to obtain a second sum;

(4) adding each sample value n to said
second sum to obtain a third sum; and

(5) multiplying said third sum by a predeter-
mined constant.
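
The steps of Claim 20 amount to computing c * (x[n] + 2 x[n-1] + x[n-2]) for every sample of the line at once. The sketch below models that with plain lists. The names are hypothetical, samples to the left of the line are assumed to read as zero, and the default constant of 0.25 is an assumption chosen so that a flat line passes through unchanged; the claim itself does not give a value.

```python
def read_from_left(values, distance):
    """What element n sees when it reads from element n - distance.

    Assumption: positions off the left edge of the line read as 0.
    """
    return ([0] * distance + values)[:len(values)]


def add(a, b):
    """Element-wise add, as if every processing element ran one instruction."""
    return [x + y for x, y in zip(a, b)]


def one_delay_filter(line, constant=0.25):
    """Follow the claimed sums using the original samples n, n-1 and n-2."""
    x_n1 = read_from_left(line, 1)   # sample n-1 as seen by element n
    x_n2 = read_from_left(line, 2)   # sample n-2 as seen by element n
    first = add(x_n1, x_n2)          # x[n-1] + x[n-2]
    second = add(x_n1, first)        # 2*x[n-1] + x[n-2]
    third = add(line, second)        # x[n] + 2*x[n-1] + x[n-2]
    return [constant * s for s in third]
```

For the line `[4, 8, 12, 16]` the third element comes out as (12 + 2*8 + 4) / 4 = 8.0, which matches a 1-2-1 weighted average of the sample and its two left neighbors.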